Bayesian Analysis


Lecture

September 13, 2023

Last time

  1. Parametric uncertainty matters for decision-making
  2. As we collect more data, fewer combinations of parameters are consistent with observations

Logistics preview

  • No class Monday
  • Today: focus on the methods and ideas
  • Monday, asynchronously, work through the code

Prior information

Today

  1. Prior information

  2. Bayesian Inference

  3. Sampling

  4. Example: storm surge distribution

  5. Alternative priors

  6. Wrapup

Rare disease

  1. Everyone is tested for CEVE543acitis, a rare and deadly disease
  2. It is known that 1 in 1,000 people have CEVE543acitis
  3. The test is 99% accurate
  4. Your test comes back positive. What is the probability that you have CEVE543acitis?

Bayes’ rule: discrete event version

\[ \Pr \left\{ \theta | y \right\} = \frac{\Pr \left\{ y | \theta \right\} \Pr \left\{ \theta \right\}}{\Pr \left\{ y \right\}} \]

Application: rare disease

Define \(y\) as getting a positive test result and \(\theta\) as having the underlying condition. Note that we do not observe \(\theta\) directly! Here \(y=1\) and we want to know \(\Pr\left\{\theta = 1 \mid y=1 \right\}\).

Likelihood:

  \(\Pr\left\{y \mid \theta\right\}\)   \(y = 1\)   \(y = 0\)
  \(\theta = 1\)                        0.99        0.01
  \(\theta = 0\)                        0.01        0.99

A naive application of maximum likelihood: \(\Pr\left\{y=1 \mid \theta=1 \right\} > \Pr\left\{y=1 \mid \theta=0 \right\}\), so the best estimate is \(\theta=1\).

Solving

We are studying \(\Pr\left\{\theta = 1 | y = 1 \right\}\).

  1. Step 1: \(\Pr\left\{ y = 1 \right\}\)
    1. \(\Pr\left\{y=1\right\} = \Pr \left\{ y=1, \theta=0 \right\} + \Pr \left\{ y=1, \theta=1 \right\}\)
    2. \(\Pr\left\{y=1\right\} = \Pr \left\{ \theta=0 \right\} \Pr \left\{ y = 1 | \theta=0 \right\} + \Pr \left\{ \theta=1 \right\} \Pr \left\{ y=1 | \theta=1 \right\}\)
    3. \(\underbrace{\Pr\left\{y=1\right\}}_\text{test +} = \underbrace{0.999}_\text{don't have it} \times \overbrace{0.01}^\text{false +} + \underbrace{0.001}_\text{have it} \times \overbrace{0.99}^\text{true +}\)
  2. Now plug in to Bayes’ rule
    1. \(\Pr\left\{ \theta=1 | y=1 \right\} = \frac{\Pr\left\{y=1 | \theta=1 \right\} \Pr\left\{ \theta=1 \right\}}{\Pr\left\{y=1\right\}}\)
    2. \(\Pr\left\{\theta=1 | y=1 \right\} = \frac{0.99 \times 0.001}{\Pr\left\{y=1\right\}}\)

Implementation

\(\Pr\left\{y=1\right\} = 0.01098\)
\(\Pr\left\{\theta=1 \mid y=1\right\} \approx 0.09016\)
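These two numbers can be reproduced with a few lines of Julia (a sketch of the computation; the variable names are illustrative, not the original course code):

```julia
# Known quantities
p_disease = 0.001      # Pr{θ = 1}: prevalence of the disease
p_true_pos = 0.99      # Pr{y = 1 | θ = 1}: test sensitivity
p_false_pos = 0.01     # Pr{y = 1 | θ = 0}: false-positive rate

# Marginal probability of a positive test (law of total probability)
p_pos = (1 - p_disease) * p_false_pos + p_disease * p_true_pos  # ≈ 0.01098

# Bayes' rule: probability of having the disease given a positive test
p_disease_given_pos = p_true_pos * p_disease / p_pos            # ≈ 0.09016
```

Despite the 99% accurate test, a positive result implies only about a 9% chance of disease, because the disease is so rare.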

Bayesian Inference

Today

  1. Prior information

  2. Bayesian Inference

  3. Sampling

  4. Example: storm surge distribution

  5. Alternative priors

  6. Wrapup

Key idea

  1. Parameters have probability distributions, not single point values
  2. Start with some prior distribution for parameters
  3. Goal: what is the distribution of the parameters given the data?

Bayes’ rule for distributions

\[ p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} \]

If we are drawing samples from a distribution, we only need to evaluate it up to a constant of proportionality, and since \(p(y)\) doesn't depend on \(\theta\), we can usually ignore it.

\[ \overbrace{p(\theta \mid y)}^\text{posterior} \propto \underbrace{p(y \mid \theta)}_\text{likelihood} \overbrace{p(\theta)}^\text{prior} \]

Coin flipping

We flip a coin a few times. We want to estimate the probability of heads so that we can make well-calibrated bets on future coin tosses.

We observe 8 heads in 9 flips.

Maximum likelihood

The maximum likelihood estimate (MLE) is the value of \(\theta\) under which the observed data are most likely. As before, we can work with the log-likelihood.

  1. This builds on what we did last time. A coin flip is represented by a Bernoulli process. In fact, we could use a Binomial distribution to model the number of heads in \(N\) flips.
  2. The maximum likelihood estimate can in fact be shown to be exactly n_heads / length(coin_flips)
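For the hypothetical data from this lecture (8 heads in 9 flips), the MLE is a one-liner:

```julia
# Hypothetical coin-flip record: 8 heads (true) out of 9 flips
coin_flips = [true, true, true, false, true, true, true, true, true]

n_heads = sum(coin_flips)
θ_mle = n_heads / length(coin_flips)  # 8/9 ≈ 0.889
```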

Prior

We should be suspicious of our analysis when it concludes that we will continue to see 8 out of 9 flips coming up heads forever.

To perform a Bayesian analysis, we’ll need a prior. A Beta distribution is a natural choice for a prior on a probability, although we could use a Uniform distribution or even something silly like a truncated Gamma (don’t!)

Closed-form solution

Cool property: if you have a Beta prior and a Binomial likelihood, the posterior is also Beta distributed. Look up Beta-Binomial conjugacy for more! We will leverage this property to check our answers.
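Concretely, a Beta(\(\alpha\), \(\beta\)) prior combined with \(h\) heads in \(n\) flips gives a Beta(\(\alpha + h\), \(\beta + n - h\)) posterior. A sketch with the 8-of-9 data (the Beta(2, 2) prior here is an illustrative choice):

```julia
# Beta(α, β) prior on the probability of heads
α, β = 2.0, 2.0

# Data: h heads in n flips (Binomial likelihood)
h, n = 8, 9

# Beta-Binomial conjugacy: posterior is Beta(α + h, β + n - h)
α_post = α + h        # 10.0
β_post = β + (n - h)  # 3.0

posterior_mean = α_post / (α_post + β_post)  # 10/13 ≈ 0.769
```

Note how the posterior mean sits between the prior mean (0.5) and the MLE (8/9).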

Sampling

Today

  1. Prior information

  2. Bayesian Inference

  3. Sampling

  4. Example: storm surge distribution

  5. Alternative priors

  6. Wrapup

Markov Chain Monte Carlo

  1. A class of methods for sampling from a probability distribution
  2. Random walkers:
    1. Start at some value
    2. Propose a new value
    3. Accept or reject the new value based on some criteria
  3. Repeat to get a “chain” of samples

Metropolis-Hastings

See the very good Wikipedia article
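A minimal random-walk Metropolis sampler (the special case of Metropolis-Hastings with a symmetric proposal), targeting the coin-flip posterior from earlier; the Beta(1, 1) prior, step size, and function names are illustrative choices:

```julia
using Random

# Unnormalized log-posterior for the coin-flip problem:
# uniform Beta(1, 1) prior × Binomial likelihood (8 heads in 9 flips)
function log_post(θ)
    (0 < θ < 1) || return -Inf
    return 8 * log(θ) + log(1 - θ)
end

function metropolis(log_post, θ0; n_samples=50_000, step=0.1,
                    rng=Random.MersenneTwister(543))
    samples = Vector{Float64}(undef, n_samples)
    θ = θ0
    for i in 1:n_samples
        θ_prop = θ + step * randn(rng)  # symmetric random-walk proposal
        # Accept with probability min(1, p(θ_prop | y) / p(θ | y))
        if log(rand(rng)) < log_post(θ_prop) - log_post(θ)
            θ = θ_prop
        end  # otherwise reject: keep the current θ
        samples[i] = θ
    end
    return samples
end

chain = metropolis(log_post, 0.5)
mean_θ = sum(chain) / length(chain)  # near the exact posterior mean 9/11 ≈ 0.818
```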

Limitations

  1. Works great for a very simple problem
  2. Computation blows up in higher dimensions (p_accept gets very small)
  3. Have to code a new sampler for each problem

Modern samplers leverage gradients and clever tricks to draw better samples for harder problems. Let’s use them!

Turing model specification

We can write down the full Bayesian model in Turing, which uses a syntax very close to our notation!
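A sketch of what such a model might look like (the model name and Beta(2, 2) prior are illustrative, not the original course code):

```julia
using Turing

# Each coin flip is a Bernoulli draw with unknown probability of heads θ
@model function coin_model(y)
    θ ~ Beta(2, 2)            # prior on the probability of heads
    for i in eachindex(y)
        y[i] ~ Bernoulli(θ)   # likelihood for each flip
    end
end

# Draw posterior samples, e.g. with the NUTS sampler:
# coin_chain = sample(coin_model(coin_flips), NUTS(), 2_000)
```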

Turing sampling

We can leverage sophisticated machinery for drawing samples from arbitrary posterior distributions. For now, we will trust that it is drawing samples from \(p(\theta \mid y)\) and not worry about the details.

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat
           θ    0.6832    0.1021    0.0016   4234.2533   5796.1544    1.0003

Visualize

We can visualize our posterior

```julia
histogram(coin_chain[:θ]; label="Samples", normalize=:pdf, legend=:topleft,
    xlabel=L"θ", ylabel=L"p(θ | y)")
plot!(closed_form; label="Exact Posterior", linewidth=3)
plot!(prior_dist; label="Prior", linewidth=3)
vline!([θ_mle]; label="MLE", linewidth=3)
```

Compromise

The posterior is a compromise between the prior and the likelihood.

  • Bad priors lead to bad inferences
  • The choice of prior is subjective, which some people hate:
    • We will approach this in a principled manner (Gelman et al., 2020; Gelman & Shalizi, 2013)
    • Lots of other steps are also subjective (choice of likelihood model, which data to use, problem framing, etc.)
    • False sense of objectivity is dangerous anyways!

Example: storm surge distribution

Today

  1. Prior information

  2. Bayesian Inference

  3. Sampling

  4. Example: storm surge distribution

  5. Alternative priors

  6. Wrapup

Read data

Model

Define a LogNormal distribution with very diffuse (flat) priors

Sample

Posterior

We leverage the histogram2d function to visualize the 2D posterior distribution.

Return period with uncertainty

Each draw from the posterior represents a plausible value of \(\mu\) and \(\sigma\). We can use these to explore the distribution of return periods.
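A sketch of the computation (the vectors of posterior draws below are made-up placeholders; in practice they would be extracted from the MCMC chain):

```julia
using Distributions

# Placeholder posterior draws of the LogNormal parameters (illustrative values)
μ_samples = [1.35, 1.37, 1.39]
σ_samples = [0.19, 0.18, 0.20]

# The T-year return level is the quantile exceeded with probability 1/T per year
T = 100
return_levels = [quantile(LogNormal(μ, σ), 1 - 1 / T)
                 for (μ, σ) in zip(μ_samples, σ_samples)]
```

The spread of `return_levels` across posterior draws quantifies the parametric uncertainty in the 100-year surge.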

Trace plot

Visualize the samples as a chain

Alternative priors

Today

  1. Prior information

  2. Bayesian Inference

  3. Sampling

  4. Example: storm surge distribution

  5. Alternative priors

  6. Wrapup

Model

We can treat the priors as parameters so that we don’t have to define a new @model each time we want to update our priors

  1. No reason why we can't pass distributions as function arguments
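One possible pattern (a sketch; the model and argument names are illustrative, not the original course code):

```julia
using Turing

# Priors are passed in as arguments, so one @model covers every prior choice
@model function surge_model(y, μ_prior, σ_prior)
    μ ~ μ_prior
    σ ~ σ_prior
    for i in eachindex(y)
        y[i] ~ LogNormal(μ, σ)
    end
end

# Swap priors without redefining the model, e.g.:
# diffuse = surge_model(gauge_data, Normal(0, 10), truncated(Normal(0, 10); lower=0))
```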

Guess and prior predictive check

Define priors

Draw samples from the prior

Plot the consequences of these samples

Revise

If we are getting return levels of \(10^{12}\) ft, we should probably revise our priors

  1. Yes, I’m overwriting the old value

We can sample

Getting closer

Now get posterior

We use the same model to get the posterior. Often we want to run multiple chains with different initial values to make sure we are getting good samples.

Summary Statistics
  parameters      mean       std      mcse     ess_bulk     ess_tail      rhat
           μ    1.3692    0.0194    0.0001   17256.1692   13201.9364    1.0003
           σ    0.1861    0.0138    0.0001   17431.4216   13529.7674    1.0001

Traceplot for multiple chains

Visualize

Note

Here our likelihood is very informative, so it doesn’t much matter if we use excessively diffuse priors. This is nice, though not something we can count on in general.

Return period with uncertainty

As before, we can visualize our posterior distribution in terms of return periods

Wrapup

Today

  1. Prior information

  2. Bayesian Inference

  3. Sampling

  4. Example: storm surge distribution

  5. Alternative priors

  6. Wrapup

Key value add of Bayesian inference

  1. Draw samples from tricky posteriors to compute expectations \(\mathbb{E}[f(\theta)]\)
  2. Quantify parametric uncertainty
    1. In practice, sometimes this is a big deal and sometimes model structure uncertainties matter more
  3. Force us to specify a data generating process
  4. Computational methods fail loudly

Learning Turing

The official docs are great.

Warning

Google will often try to link you to the old site, https://turing.ml/. This is out of date! Use https://turinglang.org/stable/ instead.

Another word on generative AI

Do not just plug in the problem and paste the solution!

  1. Labs are just 10% of your grade
  2. You will be responsible for material on projects / tests
  3. You won’t learn

Do use it for syntax help and code explanations

Logistics

  1. Friday:
    1. Lab 04
    2. Lab 03 due
  2. Monday 9/18: no class. Work on lab 04 and work through these slides.
  3. Tuesday 9/19: no office hours.
  4. Wednesday 9/20: test I review
  5. Friday 9/22: test I
  6. You may turn in lab 04 by 9/29 (two weeks)

References

Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/f4k2h4
Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., et al. (2020, November 3). Bayesian workflow. https://doi.org/10.48550/arXiv.2011.01808